Add 'crm sbd' sub-level (jsc#PED-8256) #1491
Disk-based SBD scenarios
1. Show usage when syntax error (more syntax error cases)
2. Completion
3. Display SBD related configuration (UC4 in PED-8256)
4. Change the on-disk meta data of the existing sbd disks (UC2.1 in PED-8256)
5. Add a sbd disk with the existing sbd configuration (UC2.2 in PED-8256)
6. Remove a sbd disk (UC2.3 in PED-8256)
7. Purge sbd from cluster
8. Replace the storage for a sbd disk (UC2.4 in PED-8256)
9. Display status (focusing on the runtime information only) (UC5 in PED-8256)
10. Overwrite case
Disk-less SBD scenarios
1. Show usage when syntax error (diskless)
2. Completion (diskless)
3. Display SBD related configuration (UC4 in PED-8256, diskless)
4. Manipulate the basic diskless sbd configuration (UC3.1 in PED-8256)
5. Remove diskless sbd from cluster
…rties under diskless sbd
After adding sbd device interface to manage devices, related functionalities inside sbd configure interface should be adjusted
and make sure the metadata is consistent between devices.
Add a log message to indicate the start of pacemaker.service. This helps users understand that the system is not hanging but is actually starting pacemaker, especially when SBD_DELAY_START is set and it takes longer to start pacemaker.
to avoid duplicate info message.
to redirect stderr to stdout.
And the `sbd purge` command will also move /etc/sysconfig/sbd to /etc/sysconfig/sbd.bak on all nodes.
Add output of sbd process in
- Return immediately if no changes are made
- Adjust watchdog timeout and msgwait values properly
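A minimal sketch of what these two points could look like, not the code from this PR. The function name `plan_timeout_update` is hypothetical, and the rule that msgwait should be at least twice the watchdog timeout is an assumption based on the commonly documented SBD recommendation; the option names come from the `crm sbd configure` synopsis.

```python
def plan_timeout_update(current: dict, requested: dict) -> dict:
    """Return only the settings that actually need to change.

    Hypothetical helper for illustration; not part of crmsh.
    """
    # Keep only the requested values that differ from the current ones.
    changes = {k: v for k, v in requested.items() if current.get(k) != v}
    if not changes:
        # Nothing to do: the caller can return immediately.
        return {}

    merged = {**current, **changes}
    # Assumption: msgwait should be at least 2 * watchdog timeout.
    if "watchdog-timeout" in changes and "msgwait-timeout" not in requested:
        minimum = 2 * merged["watchdog-timeout"]
        if merged.get("msgwait-timeout", 0) < minimum:
            changes["msgwait-timeout"] = minimum
    return changes
```

An empty result means the command has nothing to write, so it can skip touching the disk metadata and restarting services.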
doc/crm.8.adoc (outdated)
...............
# For disk-based SBD
crm sbd configure show [disk_metadata|sysconfig|property]
crm sbd configure [device=<dev>]... [watchdog-device=<dev>] [watchdog-timeout=<integer>] [allocate-timeout=<integer>] [loop-timeout=<integer>] [msgwait-timeout=<integer>]
And I am confused by `crm sbd configure watchdog-timeout=...`. There are 3 similar items: `Timeout (watchdog) :`, `SBD_WATCHDOG_TIMEOUT=` and `stonith-watchdog-timeout=`. Which ones are expected to be modified by this command?
Two major scenarios:
- disk-based: `Timeout (watchdog) :` in the disk metadata is used. `SBD_WATCHDOG_TIMEOUT=` is useless
- diskless: `SBD_WATCHDOG_TIMEOUT=` and `stonith-watchdog-timeout=` are meant to be used by diskless-sbd only
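The two scenarios above could be summarised as a small lookup, sketched here purely for illustration. `SBDMode` and `effective_watchdog_settings` are hypothetical names, not crmsh APIs; the mapping only restates the comment.

```python
from __future__ import annotations

from enum import Enum


class SBDMode(Enum):
    DISK_BASED = "disk-based"
    DISKLESS = "diskless"


# Hypothetical table restating the review comment; not taken from crmsh.
EFFECTIVE_WATCHDOG_SETTINGS = {
    SBDMode.DISK_BASED: ["Timeout (watchdog) in the disk metadata"],
    SBDMode.DISKLESS: [
        "SBD_WATCHDOG_TIMEOUT in sysconfig",
        "stonith-watchdog-timeout property",
    ],
}


def effective_watchdog_settings(mode: SBDMode) -> list[str]:
    """Return the watchdog timeout settings that matter for the given mode."""
    return list(EFFECTIVE_WATCHDOG_SETTINGS[mode])
```

A table like this would let `crm sbd configure watchdog-timeout=...` decide which setting to touch based on the mode in use.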
If `Timeout (watchdog) :` and `SBD_WATCHDOG_TIMEOUT=` control the same thing, and only one of them is effective, we should show only the effective one in `crm sbd configure show`, or indicate which one is effective in some way.
I kind of agree with you. Let's keep debating with Xin ;)
# To keep the order of devices during removal
left_device_list = [dev for dev in self.device_list_from_config if dev not in devices_to_remove]
if len(left_device_list) == 0:
    raise self.SyntaxError("Not allowed to remove all devices")
Suggested change:
- raise self.SyntaxError("Not allowed to remove all devices")
+ raise self.SyntaxError("Not allowed to remove all devices. Run `crm cluster init sbd -S` to bootstrap the diskless-sbd")
I intentionally do not give "-F" directly here, so that the user thinks twice. We can debate this.
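For reference, the order-preserving removal and the "not all devices" guard can be exercised in isolation with a sketch like the following. `remaining_devices` is a hypothetical standalone function, and it raises `ValueError` where the PR code raises `self.SyntaxError`.

```python
def remaining_devices(configured, to_remove):
    """Return the configured devices minus the removed ones, in original order.

    Hypothetical standalone version of the hunk above, for illustration only.
    """
    removal_set = set(to_remove)
    # The list comprehension keeps the order of the configured devices.
    left = [dev for dev in configured if dev not in removal_set]
    if not left:
        # Stand-in for self.SyntaxError in the PR code.
        raise ValueError("Not allowed to remove all devices")
    return left
```

Removing `/dev/sdb1` from a list of three devices leaves the other two in their configured order, while removing every device raises the error.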
        return False
    if not sbd.SBDUtils.is_using_disk_based_sbd():
        logger.error("Only works for disk-based SBD")
        logger.info("Please use 'crm cluster init -s <dev1> [-s <dev2> [-s <dev3>]]' to configure disk-based SBD first")
Probably better to suggest using the SBD stage
Suggested change:
- logger.info("Please use 'crm cluster init -s <dev1> [-s <dev2> [-s <dev3>]]' to configure disk-based SBD first")
+ logger.info("Please use 'crm cluster init sbd -s <dev1> [-s <dev2>]' to configure the disk-based SBD first")
for node in self.cluster_nodes:
    out = self.cluster_shell.get_stdout_or_raise_error(scripts_in_shell, node)
    if out:
        print(f"# Status of sbd process on {node}:")
Suggested change:
- print(f"# Status of sbd process on {node}:")
+ print(f"# Status of the sbd disk watcher process on {node}:")
Also, this information should not be printed for diskless SBD.
        )
        sbd_manager.init_and_deploy_sbd()

    def _configure_diskless(self, parameter_dict: dict):
Same as for disk-based sbd, I expect to see TimeoutStartSec get updated along with `crm sbd configure watchdog-timeout=50`, for example.
It sounds like the diskless bootstrap code doesn't do this either.
Motivation
The main configurations for sbd use cases are scattered among sysconfig,
on-disk meta data, and the CIB, and can even involve other OS components,
e.g. coredump, SCSI, multipath.
It's desirable to reduce the management complexity among them and to
streamline the workflow for the main use case scenarios.
Changes include
Disk-based SBD scenarios
Disk-less SBD scenarios